
Overview

Phase 4 is the final enrichment stage where the base JSON from Phase 3 is modified in-place by 5 specialized scripts. Each script adds specific fields to complete the 86-field schema.
Order matters! Scripts must run sequentially:
  1. advanced_metrics_processor.py (needs OHLCV)
  2. process_earnings_performance.py (needs filings + OHLCV)
  3. enrich_fno_data.py (needs external FNO data)
  4. process_market_breadth.py (needs returns + SMA status)
  5. add_corporate_events.py (MUST BE LAST — adds event markers + news)

Execution Order

Phase 4 runs 5 scripts sequentially:
  1. Advanced Metrics Injection: advanced_metrics_processor.py injects ADR, RVOL, ATH and Turnover metrics
  2. Earnings Performance Injection: process_earnings_performance.py injects post-earnings returns
  3. F&O Data Injection: enrich_fno_data.py injects the F&O flag, lot size and next expiry
  4. Market Breadth Processing: process_market_breadth.py generates sector analytics and breadth metrics
  5. Corporate Events Injection (FINAL): add_corporate_events.py injects event markers, announcements and the news feed

Script 1: advanced_metrics_processor.py

Purpose

Injects OHLCV-derived metrics: ADR, RVOL, ATH, Turnover, Volume EMA.

Input Files

all_stocks_fundamental_analysis.json   (Phase 3 output)
ohlcv_data/{SYMBOL}.csv                (Phase 2.5 output)
complete_price_bands.json              (Phase 2 output)

Processing Logic

import pandas as pd

# `symbol` is the ticker currently being processed
df = pd.read_csv(f"ohlcv_data/{symbol}.csv")

# Daily range as a percentage of the day's low
df['Daily_Range_Pct'] = (df['High'] - df['Low']) / df['Low'] * 100

# Average daily range (ADR) over the last N sessions; assumes rows are sorted oldest to newest
adr_5 = df['Daily_Range_Pct'].tail(5).mean()
adr_14 = df['Daily_Range_Pct'].tail(14).mean()
adr_20 = df['Daily_Range_Pct'].tail(20).mean()
adr_30 = df['Daily_Range_Pct'].tail(30).mean()

Fields Injected

| Field | Description | Example |
|---|---|---|
| RVOL | Relative volume (today vs. 20-day average) | 1.45 |
| 5 Days MA ADR(%) | 5-day average daily range | 3.2 |
| 14 Days MA ADR(%) | 14-day average daily range | 3.5 |
| 20 Days MA ADR(%) | 20-day average daily range | 3.4 |
| 30 Days MA ADR(%) | 30-day average daily range | 3.6 |
| % from ATH | Distance from all-time high | -12.5 |
| ATH_Value | All-time high price | 2850.00 |
| Gap Up % | Today’s gap vs. yesterday’s close | 1.2 |
| Day Range(%) | Today’s high-low spread | 2.8 |
| 6 Month Returns(%) | 6-month price return | 18.5 |
| % from 52W Low | Distance from 52-week low | 72.8 |
| 30 Days Average Rupee Volume(Cr.) | 30-day average turnover | 1250.5 |
| Daily Rupee Turnover 20(Cr.) | 20-day average turnover | 1180.2 |
| Daily Rupee Turnover 50(Cr.) | 50-day average turnover | 1120.8 |
| Daily Rupee Turnover 100(Cr.) | 100-day average turnover | 1050.3 |
| 200 Days EMA Volume | 200-day EMA of volume | 12500000 |
| % from 52W High 200 Days EMA Volume | Volume EMA trend | -8.5 |
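Most of the remaining fields in the table can be derived from the same OHLCV DataFrame. The sketch below is a minimal illustration, not the script's actual code. It assumes rows are sorted oldest to newest with `Open`, `High`, `Low`, `Close` and `Volume` columns; that RVOL divides today's volume by the mean of the previous 20 sessions; that ATH is the highest `High` in the file; and that rupee turnover is `Close * Volume` converted to crores. The helper name `ohlcv_metrics` is hypothetical, and only a subset of fields is shown.

```python
import pandas as pd

CRORE = 1e7  # 1 crore = 10,000,000 rupees


def ohlcv_metrics(df: pd.DataFrame) -> dict:
    """Hypothetical helper: derive a subset of Script 1 fields from daily OHLCV rows.

    Assumes df is sorted oldest -> newest and has Open/High/Low/Close/Volume columns.
    """
    today, prev = df.iloc[-1], df.iloc[-2]

    # RVOL: today's volume vs. the mean volume of the previous 20 sessions (assumed window)
    avg_vol_20 = df["Volume"].iloc[-21:-1].mean()
    rvol = today["Volume"] / avg_vol_20 if avg_vol_20 else 0.0

    # ATH: highest high in the available history (assumed definition)
    ath = df["High"].max()
    pct_from_ath = (today["Close"] - ath) / ath * 100

    # Gap vs. yesterday's close, and today's high-low spread relative to the low
    gap_up_pct = (today["Open"] - prev["Close"]) / prev["Close"] * 100
    day_range_pct = (today["High"] - today["Low"]) / today["Low"] * 100

    # Daily rupee turnover in crores, averaged over a trailing window
    turnover_cr = df["Close"] * df["Volume"] / CRORE

    return {
        "RVOL": round(float(rvol), 2),
        "% from ATH": round(float(pct_from_ath), 2),
        "ATH_Value": round(float(ath), 2),
        "Gap Up %": round(float(gap_up_pct), 2),
        "Day Range(%)": round(float(day_range_pct), 2),
        "Daily Rupee Turnover 20(Cr.)": round(float(turnover_cr.tail(20).mean()), 2),
        "200 Days EMA Volume": round(float(df["Volume"].ewm(span=200, adjust=False).mean().iloc[-1])),
    }
```

Other windows (50- and 100-day turnover, 6-month returns) follow the same pattern with a different `tail(n)` or lookback.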

Threading

  • Workers: 10 concurrent threads
  • Typical Time: ~1-2 minutes (reading 2,775 CSV files)
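The 10-worker fan-out can be sketched with `concurrent.futures.ThreadPoolExecutor`. This assumes each symbol is processed independently by a per-symbol function (here a hypothetical `process_symbol` argument) and that one bad CSV should not stop the rest. Threads are a reasonable fit when the work is dominated by reading files.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def run_for_all(symbols: Iterable[str],
                process_symbol: Callable[[str], dict],
                workers: int = 10) -> Dict[str, dict]:
    """Run process_symbol for every symbol on a thread pool and collect results by symbol.

    Symbols whose processing raises are skipped, so their fields keep their defaults.
    """
    results: Dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(process_symbol, s): s for s in symbols}
        for fut in as_completed(futures):
            symbol = futures[fut]
            try:
                results[symbol] = fut.result()
            except Exception as exc:  # e.g. a missing or malformed CSV
                print(f"{symbol}: skipped ({exc})")
    return results
```

Results arrive in completion order rather than input order, which is why they are keyed by symbol.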

Dependency on OHLCV

If FETCH_OHLCV = False in Phase 2.5, all these fields will remain 0.

Script 2: process_earnings_performance.py

Purpose

Injects post-earnings price performance metrics.

Input Files

all_stocks_fundamental_analysis.json   (Phase 3 output, modified by Script 1)
company_filings/{SYMBOL}_filings.json  (Phase 2 output)
ohlcv_data/{SYMBOL}.csv                (Phase 2.5 output)

Processing Logic

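from datetime import datetime

# Assumes filings are ordered newest first, so the first match is the latest result
earnings_date = None
for filing in company_filings:
    caption = (filing.get("caption") or "").lower()
    if "quarterly" in caption and "results" in caption:
        earnings_date = datetime.strptime(filing["news_date"], "%Y-%m-%d")
        break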

Fields Injected

| Field | Description | Example |
|---|---|---|
| Quarterly Results Date | Date of latest earnings filing | 2026-02-15 |
| Returns since Earnings(%) | % change from pre-earnings close to current | 8.5 |
| Max Returns since Earnings(%) | Peak % gain since earnings | 12.3 |
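Once `earnings_date` is known, the two return fields can be read off the OHLCV rows. A minimal sketch, assuming a `Date` column, rows sorted oldest to newest, the last close strictly before the earnings date as the base price, and the highest close on or after it as the peak (the actual script may use intraday highs instead). The helper name `earnings_returns` is hypothetical.

```python
from datetime import datetime

import pandas as pd


def earnings_returns(df: pd.DataFrame, earnings_date: datetime) -> dict:
    """Hypothetical helper: post-earnings returns from daily OHLCV rows.

    Base price = last close strictly before the earnings date (assumed);
    peak = highest close on or after it (assumed).
    """
    result = {
        "Quarterly Results Date": earnings_date.strftime("%Y-%m-%d"),
        "Returns since Earnings(%)": None,
        "Max Returns since Earnings(%)": None,
    }
    dates = pd.to_datetime(df["Date"])
    before = df.loc[dates < earnings_date, "Close"]
    after = df.loc[dates >= earnings_date, "Close"]
    if before.empty or after.empty:
        return result  # not enough history on one side of the filing

    base = before.iloc[-1]
    current = df["Close"].iloc[-1]
    result["Returns since Earnings(%)"] = round(float((current - base) / base * 100), 2)
    result["Max Returns since Earnings(%)"] = round(float((after.max() - base) / base * 100), 2)
    return result
```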

Typical Time

~2-3 minutes — Reading 2,775 filing JSONs + CSV lookups

Script 3: enrich_fno_data.py

Purpose

Injects F&O (Futures & Options) metadata: lot size, next expiry, F&O flag.

Input Files

all_stocks_fundamental_analysis.json   (Phase 3 output, modified by Scripts 1-2)
fno_lot_sizes_cleaned.json             (External standalone script)
fno_expiry_calendar.json               (External standalone script)

Processing Logic

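# master_data: list of stock dicts from all_stocks_fundamental_analysis.json
# lot_map: symbol -> lot size, built from fno_lot_sizes_cleaned.json
for stock in master_data:
    symbol = stock["Symbol"]

    # Flag the stock as F&O-enabled if it appears in the lot-size map
    stock["Is FNO"] = 1 if symbol in lot_map else 0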

Fields Injected

| Field | Description | Example |
|---|---|---|
| Is FNO | 1 if F&O enabled, 0 otherwise | 1 |
| FNO Lot Size | Contract lot size | 250 |
| Next Expiry | Next futures expiry date | 2026-03-27 |
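The loop above only sets the flag. Filling all three fields could look like the sketch below, which assumes `fno_lot_sizes_cleaned.json` is a `{"SYMBOL": lot_size}` object, `fno_expiry_calendar.json` is a list of ISO `YYYY-MM-DD` dates, and non-F&O stocks get `"N/A"`. These file shapes and the helper names (`next_expiry`, `enrich_fno`, `load_inputs`) are assumptions, not the script's confirmed format.

```python
import json
from datetime import date
from typing import Dict, List, Optional


def next_expiry(expiries: List[str], today: date) -> Optional[str]:
    """Earliest expiry on or after today; ISO strings sort chronologically."""
    upcoming = [d for d in expiries if date.fromisoformat(d) >= today]
    return min(upcoming) if upcoming else None


def enrich_fno(master_data: List[dict], lot_map: Dict[str, int],
               expiries: List[str], today: date) -> None:
    """Hypothetical helper: fill the three F&O fields in place ('N/A' for non-F&O stocks)."""
    expiry = next_expiry(expiries, today) or "N/A"
    for stock in master_data:
        lot = lot_map.get(stock["Symbol"])
        stock["Is FNO"] = 1 if lot is not None else 0
        stock["FNO Lot Size"] = lot if lot is not None else "N/A"
        stock["Next Expiry"] = expiry if lot is not None else "N/A"


def load_inputs(lot_path: str = "fno_lot_sizes_cleaned.json",
                expiry_path: str = "fno_expiry_calendar.json"):
    # Assumed shapes: {"SYMBOL": lot_size, ...} and ["YYYY-MM-DD", ...]
    with open(lot_path) as f:
        lot_map = json.load(f)
    with open(expiry_path) as f:
        expiries = json.load(f)
    return lot_map, expiries
```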

Typical Time

~10-20 seconds — Simple JSON lookups

Script 4: process_market_breadth.py

Purpose

Generates sector-level analytics and relative strength ratings.
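This page does not say how the relative strength rating (RSR) is computed. One common convention, shown here purely as an assumption, ranks each stock's `6 Month Returns(%)` (injected by Script 1) across the whole universe and scales the position to 1-99. The output key `RSR` and the helper `add_rs_rating` are hypothetical.

```python
from typing import List


def add_rs_rating(master_data: List[dict], field: str = "6 Month Returns(%)") -> None:
    """Assign a 1-99 percentile-style rating by `field` under a hypothetical 'RSR' key, in place.

    Missing values are treated as 0. Ties keep their input order (stable sort).
    """
    ranked = sorted(master_data, key=lambda s: s.get(field) or 0)
    n = len(ranked)
    for i, stock in enumerate(ranked):
        # Linear map of the sorted position onto 1 (weakest) .. 99 (strongest)
        stock["RSR"] = 1 + round(98 * i / (n - 1)) if n > 1 else 99
```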

Input Files

all_stocks_fundamental_analysis.json   (Phase 3 output, modified by Scripts 1-3)

Processing Logic

from collections import defaultdict

# Per-sector counters, created on first access
sector_stats = defaultdict(lambda: {"above_sma_50": 0, "above_sma_200": 0, "total_stocks": 0})

for stock in master_data:
    stats = sector_stats[stock.get("Sector")]
    stats["total_stocks"] += 1

    # Substring match on the SMA Status text
    status = stock.get("SMA Status", "")
    if "Above" in status and "SMA 50" in status:
        stats["above_sma_50"] += 1

    if "Above" in status and "SMA 200" in status:
        stats["above_sma_200"] += 1
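The raw counts become breadth readings once divided by each sector's size. A minimal sketch, assuming `sector_analytics.json` stores percentages of stocks above each SMA; the key names `pct_above_sma_50` / `pct_above_sma_200` and the helper `breadth_percentages` are hypothetical.

```python
from typing import Dict


def breadth_percentages(sector_stats: Dict[str, dict]) -> Dict[str, dict]:
    """Convert per-sector counts into % of stocks above SMA 50 / SMA 200 (assumed output shape)."""
    out = {}
    for sector, s in sector_stats.items():
        total = s["total_stocks"]
        out[sector] = {
            "total_stocks": total,
            # Guard against empty sectors to avoid division by zero
            "pct_above_sma_50": round(100 * s["above_sma_50"] / total, 1) if total else 0.0,
            "pct_above_sma_200": round(100 * s["above_sma_200"] / total, 1) if total else 0.0,
        }
    return out
```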

Output Files

| File | Description |
|---|---|
| sector_analytics.json | Sector-level breadth metrics |
| market_breadth.csv | Daily market breadth snapshot |

Typical Time

~20-30 seconds — In-memory calculations

Script 5: add_corporate_events.py (CRITICAL FINAL STEP)

Purpose

MUST BE LAST! Injects event markers, regulatory announcements, and news feed.

Input Files

all_stocks_fundamental_analysis.json       (Phase 3 output, modified by Scripts 1-4)
upcoming_corporate_actions.json            (Phase 2 output)
company_filings/{SYMBOL}_filings.json      (Phase 2 output)
market_news/{SYMBOL}_news.json             (Phase 2 output)
nse_asm_list.json                          (Phase 2 output)
nse_gsm_list.json                          (Phase 2 output)
bulk_block_deals.json                      (Phase 2 output)
incremental_price_bands.json               (Phase 2 output)

Event Marker Logic

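import json
from collections import defaultdict

# Hypothetical add_event: collects unique markers per symbol
event_markers = defaultdict(list)
def add_event(symbol, marker):
    if marker not in event_markers[symbol]:
        event_markers[symbol].append(marker)

with open(asm_file, "r") as f:  # asm_file: path to nse_asm_list.json
    asm_data = json.load(f)

for item in asm_data:
    symbol = item.get("Symbol")
    stage = item.get("Stage", "")

    # Long-term and short-term ASM stages get distinct markers
    if "LTASM" in stage:
        add_event(symbol, "★: LTASM")
    elif "STASM" in stage:
        add_event(symbol, "★: STASM")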

Announcements Injection

# Top 5 regulatory filings
filings = load_filings(symbol)[:5]

announcements = []
for filing in filings:
    announcements.append({
        "Date": filing.get("news_date"),
        "Headline": filing.get("caption"),
        "URL": filing.get("pdf_url")
    })

stock["Recent Announcements"] = announcements

News Feed Injection

# Top 5 news items with sentiment
news_items = load_news(symbol)[:5]

news_feed = []
for news in news_items:
    news_feed.append({
        "Title": news.get("title"),
        "Sentiment": news.get("sentiment"),  # positive/negative/neutral
        "Date": news.get("timestamp")
    })

stock["News Feed"] = news_feed
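`load_filings` and `load_news` are called above but not shown. A minimal stand-in, assuming each per-symbol file holds a JSON array already sorted newest first (which is what makes `[:5]` the five most recent items) and that a missing or unreadable file should yield an empty list rather than an error. These are illustrative helpers, not the pipeline's own.

```python
import json
from pathlib import Path
from typing import List


def _load_list(path: Path) -> List[dict]:
    """Return the JSON array stored at path, or [] if the file is missing or unreadable."""
    try:
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return []
    return data if isinstance(data, list) else []


def load_filings(symbol: str, root: str = "company_filings") -> List[dict]:
    # company_filings/{SYMBOL}_filings.json (Phase 2 output)
    return _load_list(Path(root) / f"{symbol}_filings.json")


def load_news(symbol: str, root: str = "market_news") -> List[dict]:
    # market_news/{SYMBOL}_news.json (Phase 2 output)
    return _load_list(Path(root) / f"{symbol}_news.json")
```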

Fields Injected

| Field | Description | Example |
|---|---|---|
| Event Markers | Array of event strings | ["★: LTASM", "💸: Dividend (15-Mar)", "📦: Block Deal"] |
| Recent Announcements | Top 5 regulatory filings | [{"Date": "2026-02-15", "Headline": "Quarterly Results", "URL": "..."}] |
| News Feed | Top 5 news items | [{"Title": "Stock hits 52W high", "Sentiment": "positive", "Date": "2026-03-01"}] |

Event Marker Icons Reference

| Icon | Name | Trigger Condition |
|---|---|---|
| ★ | Surveillance | Stock in ASM/GSM lists |
| 📊 | Results Recently Out | Results filed in last 7 days |
| 🔑 | Insider Trading | SEBI Reg 7(2) / Form C in last 15 days |
| 📦 | Block Deal | Bulk/block deal in last 7 days |
| # | Circuit Revision | Price band changed |
|  | Results Upcoming | Results due in next 14 days |
| 💸 | Dividend | Dividend ex-date in next 30 days |
| 🎁 | Bonus | Bonus ex-date in next 30 days |
| ✂️ | Split | Split ex-date in next 30 days |
| 📈 | Rights | Rights issue in next 30 days |
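The date-window triggers for corporate actions can be sketched as a single check. This assumes each entry in `upcoming_corporate_actions.json` has already been normalized to a purpose of exactly `Dividend`, `Bonus`, `Split` or `Rights` plus an ISO ex-date, and reuses the marker format from the `Event Markers` example (`💸: Dividend (15-Mar)`). The helper `action_marker` is hypothetical.

```python
from datetime import date
from typing import Optional

# Icon per corporate-action type, taken from the icon reference table
ACTION_ICONS = {"Dividend": "💸", "Bonus": "🎁", "Split": "✂️", "Rights": "📈"}


def action_marker(purpose: str, ex_date: str, today: date,
                  window_days: int = 30) -> Optional[str]:
    """Build a marker like '💸: Dividend (15-Mar)' if ex_date falls within the next window_days.

    Assumes ex_date is 'YYYY-MM-DD' and purpose is one of the ACTION_ICONS keys.
    """
    icon = ACTION_ICONS.get(purpose)
    if icon is None:
        return None
    ex = date.fromisoformat(ex_date)
    if 0 <= (ex - today).days <= window_days:
        return f"{icon}: {purpose} ({ex.strftime('%d-%b')})"
    return None
```

Look-back triggers such as "in last 7 days" work the same way with the sign of the day difference reversed.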

Typical Time

~3-5 minutes — Reading 2,775 filing JSONs + 2,775 news JSONs + event logic

Phase 4 Output Summary

Final Master JSON

📦 Phase 4 Final Output:
└─ all_stocks_fundamental_analysis.json   (~55 MB, 2,775 records, all 86 schema fields present per record)

Field Completion Status

✅ 60 fields populated (fundamentals, technicals, ratios)
❌ 26 fields placeholder (0 or empty arrays)
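To see which fields are still placeholders in a given run, a sketch like the following counts, per field, how many records hold an empty value. What counts as a placeholder (`0`, `None`, `""`, `"N/A"`, empty list or dict) is an assumption based on the defaults mentioned on this page; legitimately zero values, such as a flat 0% return, are counted too.

```python
from collections import Counter
from typing import List


def is_placeholder(value) -> bool:
    """Assumed rule: None, 0, empty strings/lists/dicts and 'N/A' are placeholders."""
    if value is None:
        return True
    if isinstance(value, (str, list, dict)):
        return len(value) == 0 or value == "N/A"
    return value == 0


def placeholder_fields(records: List[dict], threshold: float = 1.0) -> List[str]:
    """Fields whose share of placeholder values across all records is >= threshold."""
    counts: Counter = Counter()
    for rec in records:
        for key, value in rec.items():
            if is_placeholder(value):
                counts[key] += 1
    n = len(records)
    return sorted(k for k, c in counts.items() if n and c / n >= threshold)
```

Usage: load the master file with `json.load` and call `placeholder_fields(records)` to list fields that are empty for every stock, or pass a lower `threshold` to catch mostly-empty ones.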

Total Phase 4 Execution Time

~6.5-11 minutes (sum of the five per-script estimates)

Breakdown:

  • Script 1 (ADR/RVOL/ATH): ~1-2 min
  • Script 2 (Earnings): ~2-3 min
  • Script 3 (F&O): ~10-20 sec
  • Script 4 (Breadth): ~20-30 sec
  • Script 5 (Events): ~3-5 min

Critical Sequencing

Why order matters:
  1. advanced_metrics_processor.py first — Needs raw OHLCV files
  2. process_earnings_performance.py second — Needs filings + OHLCV
  3. enrich_fno_data.py third — Independent lookup
  4. process_market_breadth.py fourth — Needs returns + SMA status from previous scripts
  5. add_corporate_events.py LAST — Adds final UI elements (markers, news)
Running out of order will cause missing data or overwrites.

Error Handling

Phase 4 uses soft failure mode:
results["advanced_metrics_processor.py"] = run_script("advanced_metrics_processor.py", "Phase 4")
# Pipeline continues even if enrichment fails
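`run_script` itself is not shown here. A minimal sketch of the soft-failure behaviour, assuming each script is a standalone Python file in the working directory and that a non-zero exit code counts as failure; it is illustrative, not the pipeline's actual orchestrator.

```python
import subprocess
import sys

PHASE_4_SCRIPTS = [
    "advanced_metrics_processor.py",
    "process_earnings_performance.py",
    "enrich_fno_data.py",
    "process_market_breadth.py",
    "add_corporate_events.py",  # must stay last
]


def run_script(script: str, phase: str) -> bool:
    """Run one script with the current interpreter and return True on exit code 0.

    Soft failure: problems are reported, never raised, so later scripts still run.
    """
    try:
        proc = subprocess.run([sys.executable, script])
    except OSError as exc:
        print(f"[{phase}] {script}: could not start ({exc})")
        return False
    if proc.returncode != 0:
        print(f"[{phase}] {script}: exited with code {proc.returncode}")
    return proc.returncode == 0


def run_phase_4() -> dict:
    results = {}
    for script in PHASE_4_SCRIPTS:  # strictly sequential, in the documented order
        results[script] = run_script(script, "Phase 4")
    return results
```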

Impact of Failures

  • Script 1 fails: ADR, RVOL, ATH remain 0
  • Script 2 fails: Earnings performance fields remain null
  • Script 3 fails: F&O fields remain N/A
  • Script 4 fails: No sector analytics, no relative strength ratings (RSR)
  • Script 5 fails: Event markers, announcements, news feed remain empty

Next Phase

Once Phase 4 completes, the pipeline proceeds to Phase 5. See Pipeline Architecture for the complete pipeline overview, including Phase 5 compression details.

Validation Checklist

After Phase 4, verify:
# 1. Check file size (should be ~55 MB)
ls -lh all_stocks_fundamental_analysis.json

# 2. Validate field count
jq '.[0] | keys | length' all_stocks_fundamental_analysis.json  # Expected: 86

# 3. Check that a sample record contains the RVOL, Event Markers and Recent Announcements keys
jq '.[0]' all_stocks_fundamental_analysis.json | grep -E '(RVOL|Event Markers|Recent Announcements)'

# 4. Count stocks with event markers
jq '[.[] | select(."Event Markers" | length > 0)] | length' all_stocks_fundamental_analysis.json